as a LAN (local area network). Each peer client can obtain data packages from each other or from an
external server. However, each peer client preferably obtains data packages from other peer clients, rather
than obtaining data packages from the external server.

Description

[0001] The present invention relates to a distributed client-based data caching system and method. Specifically, the
system and method of the present invention enable data packages to be served to a client through a flexible,
non-deterministic distributed system of peer clients which cache the data packages, in order to maximize efficiency
and speed for serving the data package to the client.

[0002] Networks which connect two or more computers, such as the Internet or intranets, enable client computers
to obtain data packages, such as documents, images, messages, data packages or other types of data from remote
storage media which are not installed on the client computer itself. Instead, these remote storage media are managed
and operated through a remote computer, known as a server computer or simply as a "server" (in the same vein, the
client computer is also often termed only a "client"). The advantage of such a system is that the client computer can
potentially obtain data from any server on the network. The disadvantage of the system is the requirement for sufficient
bandwidth on the network to enable data to be transmitted from the server to the client. Furthermore, if the load is not
evenly distributed between servers on the network, one server may become overwhelmed with requests, thereby
decreasing the speed and efficiency of retrieval. Thus, currently many networks cannot provide rapid and efficient data
retrieval due to the heavy demands placed upon the available bandwidth.

[0003] Proxy servers are often installed to conserve bandwidth on an Internet connection or on connections to
other LANs (local area networks). These proxy servers cache frequently accessed data, thereby reducing the load on
the main server, and distributing demand for bandwidth more evenly across the network. Unfortunately, such proxy
servers are typically expensive to maintain. Furthermore, proxy servers require dedicated computers to be installed and
configured. Each computer on the LAN has to be separately configured in order to communicate with the proxy server.
Such configuration is deterministic, such that each client must be configured to communicate with each proxy server
separately. Thus, proxy servers have many drawbacks.

[0004] A more useful solution would enable intranets to reap the benefits of the proxy server, without requiring
dedicated machines and without requiring any special installation or configuration. Furthermore, such a solution would not
be deterministic, such that each client could communicate with more than one server according to the load on each
server, rather than according to the configuration of the client itself. Unfortunately, such a solution is not currently
available.

[0005] Therefore, there is an unmet need for, and it would be highly useful to have, a distributed client-based data
caching system and method which enable data to be stored and retrieved from a plurality of peer clients, or "caching
entities", yet which does not require any special configuration or installation of separate servers.
[0006] The present invention is of a distributed client-based data caching system and method, which enable data
to be served to a client through a flexible, non-deterministic distributed system of caching entities, in order to maximize
efficiency and speed for serving the document to the client. The caching entities are peer clients which serve the data
to each other, thereby reducing the amount of bandwidth required to obtain data from an external server.

[0007] According to the present invention, there is provided a method for distributing data packages across a
network, the network featuring an external server for serving at least one data package, the external server being a
dedicated server, the steps of the method being performed by a data processor, the method comprising the steps of: (a)
providing a plurality of peer clients attached to the network and a list of data packages being stored by each of the
plurality of peer clients, each data package on the list of data packages having an entry, the entry indicating a unique
identifier for the data package and a location of the data package in at least one of the plurality of peer clients; (b) examining
the list of data packages by a first peer client to find an entry for a data package; and (c) if the entry for the data package
is present on the list of data packages of the first peer client, retrieving the data package from the location at another of
the plurality of peer clients according to the entry for the data package.

[0008] Alternatively, the list of data packages is stored on the external server.

[0009] According to preferred embodiments of the present invention, the list of data packages is stored on at least
the first peer client. Preferably, if alternatively the entry for the data package is absent from the list of data packages of
the first peer client, the method further comprises the steps of: (d) sending a request message for the data package by
the first peer client to at least one other peer client; and (e) if a response message is received by the first peer client
from the at least one other peer client, retrieving the data package from the at least one other peer client by the first peer
client.

[0010] Preferably, the request message and the response message are transmitted to the plurality of peer clients
by broadcasting. Alternatively, the request message and the response message are transmitted to the plurality of peer
clients by multicasting. Also alternatively, the request message and the response message are transmitted to the
plurality of peer clients by polling each peer client individually.

[0011] Also alternatively and preferably, if the response message is not received from the at least one other peer
client by the first peer client, the method further comprises the step of: (f) obtaining the data package by the first peer
client from the external server. Preferably, the method further comprises the step of sending a response message by
the first peer client to the at least one other peer client substantially before the first peer client obtains the data package
from the external server. More preferably, the list of data packages is stored on each of the plurality of peer clients, and
the method further comprises the steps of: (g) receiving the response message from the first peer client by the at least
one other peer client; and (h) altering the list of data packages being stored by the at least one other peer client for
indicating the location of the data package according to the response message.

[0012] Alternatively, the list of data packages is stored on each of the plurality of peer clients, and the method
further comprises the steps of: (g) receiving the response message from the first peer client by the at least one other peer
client; and (h) altering the list of data packages being stored by the at least one other peer client for indicating the
location of the data package according to a probabilistic function.

[0013] Preferably, the probabilistic function is performed according to a set of equations:

Old location: Po(x) = 1/(generation + 1)

New location: Pn(x) = 1 - 1/(generation + 1)

wherein Pn(x) is a probability that the new location is substituted for the old location, Po(x) is a probability that the old
location is retained, and "generation" indicates how many times the location had been previously changed.
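
The two probabilities above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the injectable `rng` parameter, and the choice to increment the counter when a substitution happens are not taken from the specification, which only defines "generation" as the number of previous changes. Note that the formulas give Pn(x) = 0 when generation is 0, so the sketch leaves the initial counter value to the caller.

```python
import random


def retention_probability(generation: int) -> float:
    """Po(x): probability that the old location is retained."""
    return 1.0 / (generation + 1)


def replacement_probability(generation: int) -> float:
    """Pn(x): probability that the new location replaces the old one."""
    return 1.0 - 1.0 / (generation + 1)


def choose_location(old, new, generation, rng=random.random):
    """Apply one probabilistic update and return (location, generation).

    Assumption: the counter is incremented only when the location changes,
    following the definition of "generation" as the number of prior changes.
    """
    if rng() < replacement_probability(generation):
        return new, generation + 1
    return old, generation
```

For example, with generation = 1 both probabilities are 1/2, and with generation = 3 the old location is retained with probability 1/4 and replaced with probability 3/4.
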
[0014] Also preferably, an upper limit is predetermined for a number of the plurality of peer clients served
substantially simultaneously by the at least one other peer client, such that if a number of the plurality of peer clients served
substantially simultaneously by the at least one other peer client is greater than the upper limit, the method further
comprises the step of: (d) sending a busy message from the at least one other peer client to the first peer client.
[0015] Preferably, the external server is a Web server, and the plurality of peer clients is a plurality of Web browsers.
[0016] Also preferably, the external server is a BackWeb™ server, and the plurality of peer clients is a plurality of
BackWeb™ clients.

[0017] Preferably, the unique identifier for the data package is an MD5 digest of the data package.

[0018] According to still other preferred embodiments of the present invention, the step of retrieving the data
package is performed according to a protocol based on TCP/IP. Preferably, the protocol is HTTP. Alternatively and preferably,
the protocol is FTP.

[0019] Hereinafter, the term "protocol based on TCP/IP" includes any such protocol, including but not limited to the
HTTP (hypertext transfer protocol) and FTP (file transfer protocol) protocols.

[0020] Hereinafter, the term "data package" refers to any discrete, identifiable unit of data, including but not limited
to documents, images, messages, data packages or any other type of data.

[0021] Hereinafter, the term "computing platform" refers to a particular computer hardware system or to a particular
software operating system. Examples of such hardware systems include, but are not limited to, personal computers
(PC), Apple Macintosh™ computers, mainframes, minicomputers and workstations, which are also non-limiting
examples of data processors for operating a software application under an operating system. Examples of such software
operating systems include, but are not limited to, UNIX, VMS, Linux, MacOS™, DOS, and one of the Windows™ operating
systems by Microsoft Corp. (USA), including Windows NT™, Windows 3.x™ (in which "x" is a version number, such as
"Windows 3.1™"), Windows95™ and Windows98™.

[0022] For the present invention, a software application could be written in substantially any suitable programming
language, which could easily be selected by one of ordinary skill in the art. The programming language chosen should
be compatible with the operating system according to which the software application is executed. Examples of suitable
programming languages include, but are not limited to, C, C++ and Java.
[0023] Hereinafter, the term "broadcast" may also include "multicast" as well.

[0024] The invention is herein described, by way of example only, with reference to the accompanying drawings,
wherein:

FIGS. 1A and 1B are schematic block diagrams of an exemplary basic system and method according to the present
invention;

FIGS. 2A-2E are schematic block diagrams of an exemplary request/response protocol and method according to
the present invention;

FIG. 3 is a schematic block diagram of an exemplary preferred data-flow diagram according to the present
invention;






EP 0 993 163 A1 



FIG. 4 is a flowchart of a method for operating the system of the present invention with Web browsers; and 
FIGS. 5A and 5B are exemplary request and response messages according to the present invention. 

[0025] The present invention is of a distributed client-based data caching system and method, which enable data
to be served to a client through a flexible, non-deterministic distributed system of caching entities, in order to maximize
efficiency and speed for serving the data to the client. The caching entities are peer clients which serve the data to each
other, thereby reducing the amount of bandwidth required to obtain data from an external server.
[0026] The system and method of the present invention enable clients to share data packages among themselves
across their local network neighborhood, for example within a LAN, thereby eliminating the need for a specialized proxy
server. Furthermore, the network traffic is not significantly affected, since modern network architectures are well suited
for peer-to-peer communications. Most currently operating networks have a star topology, using switching hubs, in
which communication between two peers does not affect simultaneous communication among other nodes on the
network. Thus, the system of the present invention overcomes the drawbacks of a proxy server, yet does not add significant
loads to the traffic on the network itself.
[0027] For currently available client-server software applications known in the art, whenever a client requires a data
package, the following algorithm is performed. First, the software application attempts to locate the data package locally
on the memory or on the disk or disks of the client. Then, if the data package is not found locally, the software
application retrieves the data package from the appropriate server.

[0028] By contrast, the operation of the system of the present invention adds an intermediate step. For the present
invention, if the data package is not found locally, an attempt is made to retrieve the data package from a peer client on
the local network "neighborhood" before resorting to retrieving the data package from the server.
[0029] Thus, for the system of the present invention, every client actually functions as a caching proxy. Once a client
requires a data package, it queries all the hosts, which are actually peer clients, on the local network for that data
package. If no neighboring peer client has the data package, the client retrieves the data package from the external server
as usual. However, if a neighboring client already has the required data package, the requesting client will download
this data package from the peer client rather than from the external server.

[0030] The principles and operation of the distributed client-based data caching system according to the present
invention may be better understood with reference to the drawings and the accompanying description.

[0031] Figure 1A is a schematic block diagram of an exemplary system according to the present invention, while
Figure 1B is a flowchart of the operation of the system of Figure 1A. Figure 1A shows a system 10 which includes a
plurality of peer clients 12 connected by a local network 14 of some type, for example a LAN, indicated by the heavier
line in Figure 1A. Two peer clients 12, labeled as "peer client 1" 20 and "peer client 2" 22, are shown for the purposes
of illustration only and without intending to be limiting in any way. Each peer client 12 is also connected to an external
server 16 of some type by an external connection 18. Although only one external server 16 is shown, a plurality of
external servers could also be implemented. External server 16 is a dedicated server, in the sense that this server has a
primary or at least a substantially significant role as a server for data packages. As shown for the purposes of illustration,
external connection 18 only connects to local network 14 at one point, although multiple such external connections
could also be implemented (not shown). In addition, external connection 18 could also optionally connect each peer
client 12 directly to server 16 (not shown).

[0032] The operation of system 10 according to the present invention is illustrated with reference also to Figure 1B.
In step 1, a peer client 12 looks for a data package in the local memory or disk cache of that particular
peer client 12. If the desired data package is not found on the local disk cache, then in step 2, peer client 12
queries any other peer client(s) 12 on local network 14 to determine whether any other peer client 12 has a particular data
package. For example, peer client 20 could query peer client 22, to determine whether peer client 22 has the desired
data package. In step 3a, if peer client 22 has the desired data package, then peer client 20 obtains the data package
from peer client 22. Alternatively, as shown in step 3b, if peer client 22 does not have the desired data package, then
peer client 20 obtains the data package from server 16 through external connection 18. Thus, every peer client 12 is
also potentially a server which is internal to local network 14, and hence could be described as an "internal server" to
distinguish peer client 12 from external server 16.
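
The lookup order of steps 1, 2, 3a and 3b can be sketched as follows. This is a minimal illustration, not the specification's implementation: peers and the external server are modeled as plain callables, and the function name, parameter names and returned source label are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def obtain_package(
    package_id: str,
    local_cache: Dict[str, bytes],
    peers: Iterable[Callable[[str], Optional[bytes]]],
    fetch_from_server: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (data, source), trying local cache, then peers, then server."""
    # Step 1: look in the local memory or disk cache.
    if package_id in local_cache:
        return local_cache[package_id], "local"
    # Steps 2 and 3a: query the other peer clients on the local network.
    for ask_peer in peers:
        data = ask_peer(package_id)
        if data is not None:
            local_cache[package_id] = data  # cache so it can be served to peers
            return data, "peer"
    # Step 3b: fall back to the external server.
    data = fetch_from_server(package_id)
    local_cache[package_id] = data
    return data, "server"
```

Each peer callable returns the package bytes or `None`; the external server is contacted only when every peer has returned `None`.
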

[0033] Each peer client 12 could also be described as a "caching entity" and the data stored by each client for
serving to other peer clients 12 as "cached data" or "cached data packages".

[0034] A number of different possible embodiments of the system of the present invention can be implemented, of
which two illustrative embodiments are shown with reference to the Figures below. Briefly, Figures 2A-2D illustrate an
exemplary embodiment of the system of the present invention for implementation with the software application of
BackWeb™ (BackWeb Technologies Ltd., Ramat Gan, Israel) on a local area network (LAN). Figures 4 and 5A-5B illustrate
an exemplary embodiment of the system of the present invention for implementation with a Web browser software
application on the Internet.

[0035] Figure 2A shows an exemplary local network 24 which features a plurality of peer clients 12, of which three
are shown for the purposes of discussion only and without intending to be limiting in any way. For the purposes of
discussion only, suppose a peer client 26, labeled "A", wishes to obtain four data packages "W", "X", "Y" and "Z". None of
these data packages are local to peer client 26, which must therefore obtain these data packages from either another
peer client 12 as an internal server, or from an external server (not shown). Local area network 24 features two other
peer clients 12: peer client 28, labeled "B", and peer client 30, labeled "C". Peer client 26 must therefore first
communicate a request to peer client 28 and peer client 30 to see if the desired data packages are available at either location,
and then peer client 26 must obtain these data packages from peer client 28 or peer client 30 if the desired data
packages are available.

[0036] Preferably, two protocols are used for communication between peer clients on a local area network (LAN): a
data package-exchange protocol and a control protocol. Specifically, the data package-exchange protocol is used to
transfer data packages between peer clients, once the desired data package has been located, and is described in
greater detail with respect to Figure 2B below. The control protocol enables each peer client to efficiently build and
maintain tables which describe the location of available data packages across the local area network by exchanging
messages.

[0037] Each peer client maintains two hash-tables which contain information about data package location: a
local-data packages table and a network-data packages table. The local-data packages table is a hash-table of data
packages which reside on the storage medium or media of the peer client itself. The network-data packages table is a
hash-table of data packages which reside on the storage medium or media of other clients on the local network. This table
contains the local area network address of the peer client on which each data package is being stored. The size of this
hash-table is preferably limited in order to reduce memory consumption. More preferably, each entry in the table has a
time-stamp, such that older entries are purged when the size of the table exceeds the upper permissible limit.
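
A size-limited network-data packages table with time-stamped entries might look like the sketch below. The class name, the default limit of 1024 entries, and the oldest-first purge policy are assumptions made for illustration; the text only states that older entries are purged once the table exceeds its permissible size.

```python
import time
from typing import Dict, Optional, Tuple


class NetworkPackagesTable:
    """Maps a package identifier to (peer address, time-stamp)."""

    def __init__(self, max_entries: int = 1024):  # limit is a hypothetical default
        self.max_entries = max_entries
        self._entries: Dict[str, Tuple[str, float]] = {}

    def record(self, package_id: str, address: str, now: Optional[float] = None) -> None:
        """Store or refresh a location, then purge the oldest entries if over the limit."""
        stamp = time.time() if now is None else now
        self._entries[package_id] = (address, stamp)
        while len(self._entries) > self.max_entries:
            oldest = min(self._entries, key=lambda key: self._entries[key][1])
            del self._entries[oldest]

    def locate(self, package_id: str) -> Optional[str]:
        """Return the known peer address for a package, or None if unknown."""
        entry = self._entries.get(package_id)
        return entry[0] if entry is not None else None
```

The linear scan for the oldest entry keeps the sketch short; a real table of any size would more likely keep entries ordered by time-stamp.
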
[0038] In order to effectively identify the desired data package, preferably each data package has a unique identifier
or "fingerprint" associated with it. More preferably, this unique identifier is an MD5 digest of the content of the data
package (for a description of the MD5 specification, which is an industry standard and would therefore be obvious to one of
ordinary skill in the art, see "RFC 1321" at http://ds.internic.net/rfc/rfc1321.txt).
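
As a brief illustration, an MD5 digest of a data package's content can be computed with Python's standard `hashlib` module. The function name `package_id` is hypothetical; the choice of a hexadecimal string rather than the raw 16 bytes is also an assumption.

```python
import hashlib


def package_id(content: bytes) -> str:
    """Return the MD5 digest of the package content as a 32-character hex string."""
    return hashlib.md5(content).hexdigest()
```

Two packages with identical content yield the same identifier regardless of which peer holds them, which is what allows a requesting client to ask for content by digest alone.
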

[0039] Once any peer client knows both the unique identifier and the location of the data package on the local
network, that client can then proceed to download the data package. However, the peer client may not know the location
of the desired data package, in which case the client must follow a control protocol according to the present invention
in order to determine the location of the desired data package and to enable the client to build these hash tables with
respect to future attempts to locate a data package.

[0040] The control protocol is used to provide each client with knowledge about the locations of data packages
across the local network. In the preferred implementation illustrated with respect to Figures 2A-2D, control messages
are preferably sent and received as broadcast or multicast packets. Local area networks such as Ethernet networks
support broadcast or multicast packets such that all peer clients on a local area network receive the broadcast or
multicast packets. Effectively, a single packet can be sent to all peer clients by using broadcast or multicast, thereby
reducing the amount of traffic on the network required as a result of transmitting the request message (see for example
Chapter 12, "Broadcasting and Multicasting", of TCP/IP Illustrated, Volume 1, by W. Richard Stevens, Addison-Wesley,
1994). However, optionally the system of the present invention could poll each peer client individually with a control
message for that peer client, although this is not preferred since such individually addressed messages would consume
excessive amounts of available bandwidth. In such a situation, preferably polling would be restricted to a certain group
of peer clients as internal servers, in order to reduce the amount of traffic on the local area network.
[0041] For the preferred implementation in which broadcast or multicast is used, more preferably, the decision to
select either IP multicast or broadcast is made according to the configuration set by the network administrator for the
local area network. IP multicast is preferable in terms of load on the clients of the local network, but may not be
supported on all platforms (operating systems). More preferably, the TTL or Time to Live may be configured. The TTL
specifies the number of routers a packet can cross before being dropped. Configuring the TTL enables data package sharing
to be expanded across subnet boundaries.
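
As one hedged illustration of a configurable TTL, the sketch below creates a UDP socket and sets its multicast TTL using Python's standard `socket` module. The function name and the default of 1 (which keeps packets on the local subnet) are assumptions; the specification does not prescribe a socket API or a default value.

```python
import socket
import struct


def make_control_socket(ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose outgoing multicast packets carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL above 1 lets multicast control messages cross routers into other subnets.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock
```

A network administrator would raise the TTL only when sharing is meant to extend beyond a single subnet, since a larger TTL also widens the reach of every control message.
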

[0042] As shown with respect to Figure 2B, the control protocol of the present invention preferably operates as
follows. In step 1, peer client "A" from Figure 2A looks for a data package on the local storage medium or media. In step
2, since the data package was not found locally on the medium or media of peer client "A", peer client "A" must
download the data package and therefore preferably multicasts (or alternatively broadcasts) a request message. A request
message preferably contains a protocol identifying version number (PVN) for the control protocol of the present
invention and a list of MD5 digests of the needed data packages, as shown in Figure 2C.

[0043] Optionally and preferably, if more than one data package is desired, a list of requested data packages is
included in the request message rather than a single MD5 digest, in order to reduce the total number of request
messages on the network.
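
A request message carrying a PVN and a list of MD5 digests could be serialized as in the sketch below. The byte layout (one byte for the PVN, two bytes for the digest count, then 16 bytes per digest) and the value `PVN = 1` are purely hypothetical; the actual layout is the one shown in Figure 2C, which is not reproduced in this text.

```python
import struct
from typing import List, Tuple

PVN = 1          # hypothetical protocol version number
DIGEST_LEN = 16  # an MD5 digest is 16 bytes


def encode_request(digests: List[bytes], pvn: int = PVN) -> bytes:
    """Pack a PVN and a list of 16-byte MD5 digests into one request packet."""
    if any(len(d) != DIGEST_LEN for d in digests):
        raise ValueError("each MD5 digest must be exactly 16 bytes")
    return struct.pack("!BH", pvn, len(digests)) + b"".join(digests)


def decode_request(packet: bytes) -> Tuple[int, List[bytes]]:
    """Unpack a request packet into (pvn, list of digests)."""
    pvn, count = struct.unpack_from("!BH", packet, 0)
    body = packet[3:]
    if len(body) != count * DIGEST_LEN:
        raise ValueError("packet length does not match digest count")
    return pvn, [body[i:i + DIGEST_LEN] for i in range(0, len(body), DIGEST_LEN)]
```

Bundling several digests into one packet is what keeps the number of request messages on the network low when a client needs several data packages at once.
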

[0044] In step 3, the neighboring clients, shown as peer clients "B" and "C" in Figure 2A, receive this request
message and search for the requested data package in their local-data packages hash-table. A peer client which does not
find the data package locally does not reply, as shown in step 4a. Otherwise, in step 4b the peer client sends a response
message, preferably after waiting a short random time interval to determine whether another peer client will respond
first. More preferably, the peer client does not distribute the response message if another client responded previously, in
order to reduce unnecessary traffic on the local area network. Also more preferably, the peer client distributes the
response message by broadcast or multicast.

[0045] For example, as shown in Figure 2A, if peer client "A" requests a data package "W", peer client "B" would
reply with the response message, since peer client "B" has the data package stored locally. By contrast, peer client "C"
would not reply with a response message, since peer client "C" does not have data package "W" stored locally. On the
other hand, if peer client "A" requests a data package "X", both peer client "B" and peer client "C" could respond. In this
situation, preferably only peer client "B" or peer client "C" would respond, depending on which peer client had the
shorter random interval for waiting before sending the response message.

[0046] More preferably, responses are sent only for data packages with yet unknown locations. For example,
suppose client "A" requests data packages "W", "X", "Y" and "Z". Client "B" has data packages "W", "X" and "Y", and is the
first to reply, with a reply message indicating possession of data packages "W", "X" and "Y". Suppose another client "C"
has data packages "X", "Y" and "Z". Since it replied after client "B", the response message from client "C" will only
indicate possession of data package "Z" because this is the only data package with an as yet unknown location.
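
The rule that responses cover only data packages with as yet unknown locations can be sketched as a simple filter. The function and parameter names are hypothetical, and package names stand in for MD5 digests to mirror the example in the text.

```python
from typing import Iterable, List, Set


def packages_to_announce(
    requested: Iterable[str],
    held_locally: Set[str],
    already_answered: Set[str],
) -> List[str]:
    """Return the requested packages this peer holds that no earlier response covered."""
    return [p for p in requested if p in held_locally and p not in already_answered]


requested = ["W", "X", "Y", "Z"]
# Client "B" replies first, so nothing has been answered yet.
from_b = packages_to_announce(requested, {"W", "X", "Y"}, set())
# Client "C" replies afterwards and has overheard the response from "B".
from_c = packages_to_announce(requested, {"X", "Y", "Z"}, set(from_b))
```

Here `from_b` lists "W", "X" and "Y", while `from_c` lists only "Z", matching the example of clients "B" and "C".
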
[0047] A response message optionally contains the identifying PVN, the list of MD5 digests of data packages that
were found and a TCP port number, as shown in Figure 2D. The port number identifies on which TCP port the
responding peer client is waiting for data package requests. Alternatively, the response message optionally contains other
indicators which enable the requesting client to retrieve one or more data packages from the responding peer. Preferably,
response messages are also broadcast for data packages which are currently being downloaded from an external
server, for reasons described in greater detail below.

[0048] In step 5, the peer client downloads the data package or data packages. In principle, according to a relatively
simple embodiment of the present invention, at this stage the requesting client either receives a reply and downloads
the data packages from the client that replied; or, if a reply is not received within a certain period of time, proceeds to
download these data packages from an external server. If the peer client is downloading a data package from another
peer client as an internal server, the data package-exchange protocol is used to obtain the data package. The data
package-exchange protocol is based on some appropriate peer-to-peer communication protocol, including but not
limited to the HTTP protocol (see RFC-2068, "Hypertext Transfer Protocol - HTTP/1.1", available from
http://ds.internic.net/rfc/rfc2068.txt as of September 23, 1998).

[0049] Preferably, a more complex implementation is employed, since such a simple implementation may cause
multiple clients to fetch the same data packages from the external server simultaneously. This situation would arise if
several peer clients need to download the same data packages at approximately the same time, which is a very
probable scenario for push clients for which content delivery is triggered by an external server, since none of these clients
would receive a response to its request. Instead, the other clients would still be downloading the data package when
the new client request is broadcast, such that none of them would be ready to serve these data packages. Thus, many
or even all of the clients would attempt to retrieve the data package from the external server and not from another peer
client, thereby increasing the amount of traffic on the network and reducing the efficiency of operation of the system of
the present invention.

[0050] Preferably, the problem is solved by notifying other clients when a first client is downloading the data
package from the external server, even if the process of retrieving the data package is not yet complete. In this preferred
embodiment, the first client which requires the data package obtains the data package from the external server. Other
clients which require the data package will then download it from the first client even if the first client is still in the process
of retrieving the data package from the external server. The preferred embodiment of the method of the present
invention is described in greater detail with regard to Figure 2E.

[0051] In step 1, the requesting client again transmits the request, again preferably by broadcasting or multicasting,
and then waits for a response. If no response is received within a certain period of time, in step 2 the client transmits a
response message as if replying to its own request, indicating that this client either has the data package, or in this
case, that the client is retrieving the data package. In step 3, the client retrieves the data package from the external
server.

[0052] In step 4, other clients create an entry in their network data packages hash table, indicating the location of
the client which will be serving the data package. Thus, preferably only a single client accesses the external server for
any given data package.
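
The ordering of steps 1 to 3 (request, wait, announce, then fetch) can be sketched as below. All four callables are hypothetical stand-ins for network operations: `wait_for_response` is assumed to send the request and return a peer address, or `None` if the timeout expires.

```python
from typing import Callable, Optional


def resolve(
    package_id: str,
    wait_for_response: Callable[[str], Optional[str]],
    announce: Callable[[str], None],
    fetch_from_peer: Callable[[str, str], bytes],
    fetch_from_server: Callable[[str], bytes],
) -> bytes:
    """Obtain a package, announcing ourselves before any external download."""
    # Step 1: transmit the request and wait for a response.
    location = wait_for_response(package_id)
    if location is not None:
        return fetch_from_peer(location, package_id)
    # Step 2: no reply arrived, so respond to our own request; other peers can
    # then record this client as the location while the download is in progress.
    announce(package_id)
    # Step 3: retrieve the package from the external server.
    return fetch_from_server(package_id)
```

Announcing before the external download begins is the point of the design: peers that request the same package a moment later learn of a local source instead of each contacting the external server.
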

[0053] If a request is sent for multiple data packages, but a response is received indicating the location of only some
of the data packages at a neighboring peer client or clients, the client first obtains these data packages from the
neighboring peer client or clients. Next, the client then transmits the response message for the rest of the data packages, and
proceeds to obtain the remaining data package or data packages from the external server. Thus, the client only obtains
the data package or data packages from the external server which are not available locally, rather than obtaining all of
the data packages from the external server, thereby reducing network traffic.

[0054] According to preferred embodiments of the present invention, preferably the process of downloading data
packages from peer clients is optimized to reduce the amount of time required for downloading, the load on each
individual client and the overall network traffic. Such optimization is performed as follows.

[0055] First, preferably the exit degree of each client is bounded, such that each client is only able to serve a fixed,
limited number of other clients simultaneously. More preferably, the default limit is three other clients, for example, or
some other appropriate number which is preferably configured by the user or by the network administrator. If an
additional client attempts to download a data package from a client which is already serving the maximum number of other
clients, the additional client will receive a "busy" message. This feature limits the load on each individual client.
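
A bounded exit degree can be sketched as a simple slot counter. The class and method names are hypothetical, and the sketch is single-threaded; a real serving client would need to guard the counter with a lock.

```python
class ServingSlots:
    """Tracks how many peer clients are being served at the same time."""

    def __init__(self, limit: int = 3):  # three is the default named in the text
        self.limit = limit
        self.active = 0

    def try_acquire(self) -> bool:
        """Reserve a slot; False means the caller should reply with a "busy" message."""
        if self.active >= self.limit:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        """Free a slot once a transfer to a peer client has finished."""
        if self.active > 0:
            self.active -= 1
```

With the default limit, the first three calls to `try_acquire()` succeed and a fourth fails until `release()` is called.
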

[0056] Also preferably, the present invention is able to optimize the selection of the best client from which the data
package should be obtained. For example, if client "A" has already downloaded a larger portion of the required data
package than client "B", transferring the data package from client "A" is preferable. Such clients are preferentially
selected to serve data packages, since these clients will be able to serve the data package after a shorter time period
has elapsed. Such preferential selection occurs by shortening the time period for waiting before these clients respond,
thereby increasing the likelihood that they will serve the data packages. For this reason, the client preferably calculates
the random delay before responding such that the delay is inversely proportional to the percentage of the data package
which has been already downloaded. In addition, the random delay is preferably proportional to the number of clients
being served at the moment, in order to decrease the likelihood of overloading already busy clients.
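One possible way to compute such a delay is sketched below. The text specifies only the proportionality, so the concrete formula is an assumption: the function name `response_delay`, the `base_seconds` parameter and the use of `1 + clients_served` (so that an idle client still waits a non-zero time) are all hypothetical choices.

```python
import random


def response_delay(fraction_downloaded, clients_served, base_seconds=0.1, rng=random):
    """Random delay before answering a request (illustrative formula only).

    The upper bound of the delay grows with the current load and with
    1 / fraction_downloaded, so peers holding more of the package and
    serving fewer clients tend to answer first.
    """
    if not 0.0 < fraction_downloaded <= 1.0:
        raise ValueError("fraction_downloaded must be in (0, 1]")
    # (1 + clients_served) keeps the delay non-zero for an idle peer (assumption).
    scale = base_seconds * (1 + clients_served) / fraction_downloaded
    return rng.uniform(0.0, scale)


rng = random.Random(0)
nearly_done_idle = [response_delay(0.9, 0, rng=rng) for _ in range(1000)]
just_started_busy = [response_delay(0.1, 2, rng=rng) for _ in range(1000)]
# On average the nearly finished, idle peer waits far less than the busy one.
print(sum(nearly_done_idle) / 1000 < sum(just_started_busy) / 1000)
```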
[0057] In addition, according to other preferred embodiments of the present invention, preferably the entries of the
locations of data packages in the network data packages table are updated according to a probabilistic function. Such
a function is preferred in order to prevent all of the clients from registering a single client as the server for any particular
data package, for example. When different clients respond, usually at different times, indicating they have a specific
data package, the remaining clients listening across the network update the entry for this data package in their network
data packages table, by adding to this table the IP address, or some other type of address according to the addressing
system employed by the network, of the client which can serve the data package. In a simple implementation, the
clients would store only the last advertised location of each data package, and therefore many or all clients might
attempt to obtain the data package from a single client as the internal server, thereby overloading that client.
[0058] To avoid this situation, preferably the following probabilistic algorithm is used to determine the particular client
address which is stored in the network data packages table. Each time a new client transmits a response message
indicating that this client is able to serve a particular data package, the probability that the new IP address of the new
client is substituted for the old IP address is calculated according to the following equations:

Old IP address: Po(x) = 1/(generation+1)

New IP address: Pn(x) = 1 - 1/(generation+1)

wherein Pn(x) is the probability that a new IP address is substituted for the old IP address, Po(x) is the probability that
the old IP address is retained, and "generation" is a number indicating how many times this address had been previously
changed.

[0059] For example, if client "A" responds indicating it has data package "X", then initially all other peer clients store
the IP address of client "A" as the location of data package "X". If client "B" then broadcasts a response also indicating
that client "B" has data package "X", then the probability that any one client now changes the IP address for the location
of data package "X" is 50%. In other words, about half of the clients should now point to client "A" and about half should
point to client "B".

[0060] Such a substantially even distribution of load across multiple clients should produce data flow with a tree-shaped
topology, as shown in Figure 3, rather than a random topology, thus optimizing the average download time and
the load on the serving clients.

[0061] Furthermore, if any client requests a particular data package during the period required by client "A" for
downloading that package, preferably client "A" sends a broadcast or multicast message indicating that the package is
in the process of being downloaded. Therefore, preferably only a single client "B" polls client "A" for each data package,
for example. Other clients preferably automatically receive any responses from that polling action through the broadcast
or multicast transmission, and thus will not be forced to poll for themselves.

[0062] The polling (request/response) traffic is optimized since there is usually no need to transmit both a request
and a response for each data package needed by each client. Such optimization is possible since each client preferably
receives substantially all of the request/response communication of all the other clients and "remembers" the location
of the data packages in the network data packages table.

[0063] As previously described, the actual process for receiving a data package from an internal server is performed
according to the data package exchange protocol, by using the HTTP protocol or some other suitable peer-to-peer
communication protocol. The data package transfer software application of the present invention preferably features
a timer, for detection of an aborted transfer or a very slow data package transfer, for example. The timer determines
when such a transfer has timed out. If a time-out occurs, the requesting client preferably repeats the whole
process. If the transfer remains unsuccessful after a plurality of attempts, the client preferably ceases to attempt to
transfer the data package from the peer client as the internal server, and instead transfers the data package or data
packages directly from the external server.
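The retry-then-fall-back behavior described above might be sketched as follows. The function name `fetch_package`, the two caller-supplied callables and the default of three attempts are hypothetical; the text states only that the transfer is repeated a plurality of times before the external server is used.

```python
def fetch_package(digest, fetch_from_peer, fetch_from_server, max_attempts=3):
    """Try a peer up to max_attempts times, then use the external server.

    fetch_from_peer and fetch_from_server are caller-supplied callables
    (hypothetical); fetch_from_peer signals a time-out by raising TimeoutError.
    Returns the data together with the source it came from.
    """
    for _ in range(max_attempts):
        try:
            return fetch_from_peer(digest), "peer"
        except TimeoutError:
            continue  # repeat the whole peer procedure
    return fetch_from_server(digest), "server"


attempts = []


def flaky_peer(digest):
    """Stand-in for a peer whose transfers always time out."""
    attempts.append(digest)
    raise TimeoutError("peer transfer timed out")


data, source = fetch_package("d41d8cd9", flaky_peer, lambda d: b"payload")
print(source, len(attempts))  # server 3
```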

[0064] Again, as described previously, if a requested data package has not yet finished being downloaded by a peer
client, the requesting client receives a message indicating that the data package is not ready, as well as an indication of
the fraction of the data package already downloaded. The requesting client continues polling the serving client until the
data package download is complete. If the download becomes substantially slower or is otherwise interrupted or terminated
for a long period of time, the requesting client behaves as if a time-out occurred.

[0065] According to additional preferred features of the present invention, substantially automatic detection of peer
clients is supported. Such automatic detection enables each peer client to detect the presence of other peer clients on
the network. If such peer clients are not found, preferably the system of the present invention is disabled, since the operation
of the system as described above would only prolong the time period required to download a data package if no
other peer clients are available.

[0066] Preferably, the amount of bandwidth on the local area network which is consumed by each peer client serving
data packages to other clients is limited, to avoid over-burdening any specific host. This limit is preferably configurable
by the user or by the network administrator.

[0067] Furthermore, in order to protect peer clients from unauthorized access of local storage media through the
system of the present invention, certain security features are preferably included. For example, preferably only data
packages identified in the hash tables are able to be transferred from the client. Thus, transmitted data packages are
preferably only data packages which were intended to be served to the peer clients, such that malicious users preferably
cannot use the system of the present invention to obtain "random" data packages from the storage media of a peer
client. Data packages are more preferably only referenced by their unique identifier, such as their 128-bit MD5 digest,
such that a data package is only able to be downloaded from a client if the intended recipient knows this digest. Thus,
the name of a data package alone is preferably not sufficient information to permit retrieval of the data package from a
peer client.
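The digest-only lookup described above can be illustrated with a short sketch. The class `PackageStore` and its methods are hypothetical names introduced here; the only elements taken from the text are that packages are referenced by their 128-bit MD5 digest and that a name alone does not permit retrieval.

```python
import hashlib


class PackageStore:
    """Serves only packages that were explicitly registered, looked up by digest."""

    def __init__(self):
        self._by_digest = {}

    def register(self, content: bytes) -> str:
        """Make a package available to peers and return its MD5 hex digest."""
        digest = hashlib.md5(content).hexdigest()  # 128 bits, 32 hex characters
        self._by_digest[digest] = content
        return digest

    def serve(self, digest: str):
        """Return the package for a known digest, or None otherwise.

        File names and paths are never interpreted, so arbitrary files on the
        local storage media cannot be requested.
        """
        return self._by_digest.get(digest)


store = PackageStore()
digest = store.register(b"channel update 1")
print(store.serve(digest) == b"channel update 1")  # known digest is served
print(store.serve("update1.pkg"))                  # a bare name is refused
```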

[0068] According to another embodiment of the present invention, the system of the present invention is also applicable
to Web browsers, FTP clients, and other software applications involving client-server data transfer. As described
with reference to Figures 4 and 5A-5B, another exemplary embodiment of the present invention is used for caching Web
content.

[0069] In step 1 of Figure 4, a Web browser being operated by a client computer requests a specific data package.
First, the Web browser looks at the local cache, as is known to one of ordinary skill in the art. If the data package is found
in the local cache, then that data package is retrieved from the local cache. Otherwise, the Web browser issues a message
requesting this data package, preferably by using broadcast or multicast message transmission. The data package
is preferably uniquely defined by a unique identifier. More preferably, the unique identifier is the URL of the data
package, or alternatively and preferably a combination of the URL of the data package and a timestamp, or any other
suitable unique identifier.

[0070] For optimization, if more than one data package is required, the Web browser preferably transmits one
request message containing the list of needed data packages, thereby reducing the total network traffic across the network.
Such a situation may arise if, for example, the Web browser had just parsed an HTML (hypertext mark-up language)
document, or Web page, which contains many links to follow. Preferably and optionally, each request message
contains an identifying "magic number", which may contain the protocol version (PVN), for instance "V1.0". As shown
in Figure 5A, the request message includes the list of URLs or other unique identifiers to identify the data package or
data packages being requested, which is similar in function to the list of MD5 digests described previously for request
messages, and a unique identifier identifying the request message, shown as "REQ".
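A request message carrying these three elements might be modeled as in the sketch below. The exact layout of Figure 5A is not reproduced here, so the newline-separated text encoding, the class name `RequestMessage` and the example URLs are assumptions made purely for illustration; the version string is the assumed reading "V1.0".

```python
from dataclasses import dataclass
from typing import List

MAGIC = "V1.0"  # protocol version used as the "magic number" (assumed reading)


@dataclass
class RequestMessage:
    identifiers: List[str]  # URLs or other unique identifiers being requested
    magic: str = MAGIC
    kind: str = "REQ"       # marks the message as a request

    def encode(self) -> bytes:
        """Serialize as newline-separated text (hypothetical wire format)."""
        return "\n".join([self.magic, self.kind, *self.identifiers]).encode("utf-8")

    @classmethod
    def decode(cls, raw: bytes) -> "RequestMessage":
        """Parse a message produced by encode(); reject anything that is not a request."""
        magic, kind, *identifiers = raw.decode("utf-8").split("\n")
        if kind != "REQ":
            raise ValueError("not a request message")
        return cls(identifiers=identifiers, magic=magic, kind=kind)


# One message lists every package the browser needs after parsing a page.
msg = RequestMessage(["http://example.com/a.html", "http://example.com/logo.gif"])
wire = msg.encode()
print(RequestMessage.decode(wire) == msg)
```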

[0071] In step 2, other Web browsers across the network listen to detect request messages of this type. These Web
browsers, which are peer clients for this embodiment of the present invention, receive this request message and check
their own cache for the requested URL. If the requested URL is found in the local cache of a Web browser, that Web
browser preferably waits a random interval and then preferably transmits a response message indicating it has the
required data package (or data packages). Preferably, the message is broadcast or multicast. More preferably, that Web
browser does not reply if another Web browser had replied first. A reply message is preferably sent by a particular Web
browser even if the requested URL is still being downloaded by that Web browser.

[0072] In step 3, if no response to an issued request message is received within a certain amount of time, for example
5 seconds, then the process is preferably timed out. In this case, the Web browser preferably no longer attempts to
obtain the URL from another Web browser, and the URL is obtained from the regular Web server using the regular HTTP
protocol. Before starting to download the data package from the regular Web server, the Web browser preferably transmits
a response message indicating that this particular Web browser is downloading the data package.
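The time-out in step 3 can be sketched with a queue of incoming response messages. The function name `wait_for_peer`, the use of `queue.Queue` as the source of responses and the `announce_downloading` callback are assumptions for illustration; the 5-second default is the example value given in the text.

```python
import queue


def wait_for_peer(responses, announce_downloading, timeout_seconds=5.0):
    """Return the address of a responding peer, or None after the time-out.

    On time-out the browser first announces that it is downloading the
    package itself; the caller then fetches it from the regular Web server.
    """
    try:
        return responses.get(timeout=timeout_seconds)
    except queue.Empty:
        announce_downloading()
        return None


announced = []
silent_network = queue.Queue()  # no peer ever answers
peer = wait_for_peer(silent_network, lambda: announced.append("downloading"),
                     timeout_seconds=0.05)
print(peer, announced)  # None ['downloading']
```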
[0073] On the other hand, if a response message is received, the Web browser obtains the URL from the other Web
browser which indicated that it had the URL in the local cache. Preferably, Web browsers across the network record the
URL and the address from which the response message originated for future use, such that these Web browsers
would be able to download the URL at a future time without first transmitting the request message.
[0074] Once the Web browser is able to locate a data package on a neighboring Web browser, the Web browser
attempts to download the data package. The downloading process is performed with a suitable data-transfer protocol,
such as HTTP or FTP. If a time-out or other failure occurs during the process of data package transfer, the receiving
Web browser preferably performs substantially the entire procedure more than once. More preferably, the number of
permitted attempts to retry the transfer is configurable. If the process fails after these attempts have been performed,
preferably the Web browser transfers the required data package or data packages from the regular Web server.
[0075] According to preferred features of this embodiment of the present invention, data package downloading is
well distributed, such that the Web browsers do not obtain a data package from only a single Web browser, but rather
obtain the data package from a plurality of Web browsers. Such distribution is maintained as follows.
[0076] First, preferably the number of simultaneous data package transfers from a single Web browser is limited. If
this number is exceeded, the Web browser transmits a "busy" message to other Web browsers attempting to transfer
the data package. Next, preferably once a Web browser receives a message giving the location of a particular data
package, the corresponding entry in the hash table for that data package is not altered every time another response
message is received pertaining to this data package. The hash table is preferably altered by subsequent messages in
a probabilistic manner, such that the probability that any particular entry is updated to indicate a new location of a data
package is equal to 1/(generation+1), where "generation" counts the number of times a response message was received
for that data package.

[0077] For example, if Web browser "A" transmits a response message indicating that data package "X" is on the
local cache, then initially all of the neighboring Web browsers have an entry in the hash table indicating that Web
browser "A" is the location of data package "X". If Web browser "B" then transmits a response message for data package
"X", then each Web browser preferably now alters the entry in the hash table to indicate a new location of data
package "X" with a probability of about fifty percent, such that about fifty percent of the Web browsers now have an
entry indicating that the data package is available from Web browser "A" and such that about fifty percent of the Web
browsers now have an entry indicating that the data package is available from Web browser "B". Thus, a good load
distribution can be achieved.
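The probabilistic update of the hash table can be simulated in a few lines. This is a sketch under stated assumptions rather than the actual implementation: the class `LocationEntry` is hypothetical, and "generation" is assumed to equal 1 once the first response has been received, which is the counting that reproduces the fifty-percent figure of the example.

```python
import random


class LocationEntry:
    """One hash-table entry: where a package can be fetched, plus a response counter."""

    def __init__(self, address):
        self.address = address
        self.generation = 1  # the first response has been received (assumption)

    def on_response(self, new_address, rng):
        """Handle a later response for the same package."""
        # Replace the stored location with probability 1 / (generation + 1).
        if rng.random() < 1.0 / (self.generation + 1):
            self.address = new_address
        self.generation += 1


rng = random.Random(42)
entries = [LocationEntry("A") for _ in range(10_000)]  # every browser heard "A" first
for entry in entries:
    entry.on_response("B", rng)
share_b = sum(e.address == "B" for e in entries) / len(entries)
print(round(share_b, 2))  # close to one half
```

Under the same counting, a third responder would be adopted with probability 1/3, which would keep the entries spread roughly evenly over all three locations.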

[0078] The random delay (mentioned in step 2 above) chosen by a browser is proportional to the number of currently
served browsers, or the number of browsers currently downloading data packages from that browser, and
inversely proportional to the amount of the data package already downloaded by it. In this way, the browsers more eligible
to download from are more likely to be chosen by other browsers to serve these data packages.
[0079] While the invention has been described with respect to a limited number of embodiments, it will be appreciated
that many variations, modifications and other applications of the invention may be made.

Claims 

1. A method for distributing data packages across a network, the network featuring an external server for serving at
least one data package, the external server being a dedicated server, the steps of the method being performed by
a data processor, the method comprising the steps of:

(a) providing a plurality of peer clients attached to the network and providing a list of data packages, said data
packages being stored by each of said plurality of peer clients, each data package of said data packages having
an entry in said list, said entry indicating a unique identifier for said data package and a location of said data
package in at least one of said plurality of peer clients;

(b) examining said list of data packages by a first peer client to find an entry for a required data package; and

(c) if said entry for said data package is present on said list of data packages of said first peer client, retrieving
said data package from said location at another of said plurality of peer clients according to said entry for said
data package.

2. The method of claim 1, wherein said list of data packages is stored on at least said first peer client.

3. The method of claim 2, wherein alternatively said entry for said data package is absent from said list of data packages
of said first peer client, the method further comprising the steps of:

(d) sending a request message for said data package by said first peer client to at least one other peer client;
and

(e) if a response message is received by said first peer client from said at least one other peer client, retrieving
said data package from said at least one other peer client by said first peer client.

4. The method of claim 3, the method further comprising the step of:

(f) altering said list of data packages being stored by at least said first peer client for indicating said location of
said data package according to said response message.

5. The method of claim 4, wherein if said response message is not received from said at least one other peer client
by said first peer client, the method further comprises the step of:

(g) obtaining said data package by said first peer client from the external server.

6. The method of claim 5, further comprising the step of sending a response message by said first peer client to said
at least one other peer client substantially before said first peer client obtains said data package from the external
server.

7. The method of claim 6, wherein said list of data packages is stored on each of said plurality of peer clients, the
method further comprising the steps of:

(h) receiving said response message from said first peer client by said at least one other peer client; and

(i) altering said list of data packages being stored by said at least one other peer client for indicating said location
of said data package according to said response message.

8. The method of claim 5, wherein said list of data packages is stored on each of said plurality of peer clients, the
method further comprising the steps of:

(h) receiving said response message from said first peer client by said at least one other peer client; and

(i) altering said list of data packages being stored by said at least one other peer client for indicating said location
of said data package according to a probabilistic function.

9. The method of claim 1, wherein an upper limit is predetermined for a number of said plurality of peer clients served
substantially simultaneously by said at least one other peer client, such that if a number of said plurality of peer clients
served substantially simultaneously by said at least one other peer client is greater than said upper limit, the
method further comprises the step of:

(d) sending a busy message from said at least one other peer client to said first peer client.

10. The method of claim 1, wherein the external server is a BackWeb™ server, and said plurality of peer clients is a
plurality of BackWeb™ clients.

11. A system for distributing data packages across a network according to a list of the data packages, the system
comprising:

(a) an external server for serving at least one data package, said external server being attached to the network;
and

(b) a plurality of peer clients attached to the network, the data packages being stored by each of said plurality
of peer clients, each data package of said data packages having an entry in the list, said entry indicating a
unique identifier for said data package and a location of said data package in at least one of said plurality of
peer clients, such that each peer client retrieves a data package according to the list, each peer client first
attempting to retrieve said data package from another of said plurality of peer clients, and alternatively retrieving
said data package from said external server if said data package is not available from another of said plurality
of peer clients.